131 research outputs found

    Array-based architecture for FET-based, nanoscale electronics

    Get PDF
    Advances in our basic scientific understanding at the molecular and atomic level place us on the verge of engineering designer structures with key features at the single nanometer scale. This offers us the opportunity to design computing systems at what may be the ultimate limits on device size. At this scale, we are faced with new challenges and a new cost structure which motivates different computing architectures than we found efficient and appropriate in conventional very large scale integration (VLSI). We sketch a basic architecture for nanoscale electronics based on carbon nanotubes, silicon nanowires, and nano-scale FETs. This architecture can provide universal logic functionality with all logic and signal restoration operating at the nanoscale. The key properties of this architecture are its minimalism, defect tolerance, and compatibility with emerging bottom-up nanoscale fabrication techniques. The architecture further supports micro-to-nanoscale interfacing for communication with conventional integrated circuits and bootstrap loading

    Unifying mesh- and tree-based programmable interconnect

    Get PDF
    We examine the traditional, symmetric, Manhattan mesh design for field-programmable gate-array (FPGA) routing along with tree-of-meshes (ToM) and mesh-of-trees (MoT) based designs. All three networks can provide general routing for limited bisection designs (Rent's rule with p<1) and allow locality exploitation. They differ in their detailed topology and use of hierarchy. We show that all three have the same asymptotic wiring requirements. We bound this tightly by providing constructive mappings between routes in one network and routes in another. For example, we show that a (c,p) MoT design can be mapped to a (2c,p) linear population ToM and introduce a corner turn scheme which will make it possible to perform the reverse mapping from any (c,p) linear population ToM to a (2c,p) MoT augmented with a particular set of corner turn switches. One consequence of this latter mapping is a multilayer layout strategy for N-node, linear population ToM designs that requires only /spl Theta/(N) two-dimensional area for any p when given sufficient wiring layers. We further show upper and lower bounds for global mesh routes based on recursive bisection width and show these are within a constant factor of each other and within a constant factor of MoT and ToM layout area. In the process we identify the parameters and characteristics which make the networks different, making it clear there is a unified design continuum in which these networks are simply particular regions

    Deterministic Addressing of Nanoscale Devices Assembled at Sublithographic Pitches

    Get PDF
    Multiple techniques have now been proposed using random addressing to build demultiplexers which interface between the large pitch of lithographically patterned features and the smaller pitch of self-assembled sublithographic nanowires. At the same time, the relatively high defect rates expected for molecular-sized devices and wires dictate that we design architectures with spare components so we can map around defective elements. To accommodate and mask both of these effects, we introduce a programmable addressing scheme which can be used to provide deterministic addresses for decoders built with random nanoscale addressing and potentially defective wires. We describe how this programmable addressing scheme can be implemented with emerging, nanoscale building blocks and show how to build deterministically addressable memory banks. We characterize the area required for this programmable addressing scheme. For 2048 x 2048 memory banks, the area overhead for address correction is less than 33%, delivering net memory densities around 10^11 b/cm^2

    Fault-tolerant sub-lithographic design with rollback recovery

    Get PDF
    Shrinking feature sizes and energy levels coupled with high clock rates and decreasing node capacitance lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show the area overhead factor of our technique is roughly an order of magnitude lower than a gate level feed-forward redundancy scheme

    Fault Secure Encoder and Decoder for NanoMemory Applications

    Get PDF
    Memory cells have been protected from soft errors for more than a decade; due to the increase in soft error rate in logic circuits, the encoder and decoder circuitry around the memory blocks have become susceptible to soft errors as well and must also be protected. We introduce a new approach to design fault-secure encoder and decoder circuitry for memory designs. The key novel contribution of this paper is identifying and defining a new class of error-correcting codes whose redundancy makes the design of fault-secure detectors (FSD) particularly simple. We further quantify the importance of protecting encoder and decoder circuitry against transient errors, illustrating a scenario where the system failure rate (FIT) is dominated by the failure rate of the encoder and decoder. We prove that Euclidean geometry low-density parity-check (EG-LDPC) codes have the fault-secure detector capability. Using some of the smaller EG-LDPC codes, we can tolerate bit or nanowire defect rates of 10% and fault rates of 10^(-18) upsets/device/cycle, achieving a FIT rate at or below one for the entire memory system and a memory density of 10^(11) bit/cm^2 with nanowire pitch of 10 nm for memory blocks of 10 Mb or larger. Larger EG-LDPC codes can achieve even higher reliability and lower area overhead

    Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors

    Get PDF
    Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer factor speedups over multi-core processor organizations for data-parallel, floating-point computation in SPICE model-evaluation. Our Verilog AMS compiler produces code for parallel evaluation of non-linear circuit models suitable for use in SPICE simulations where the same model is evaluated several times for all the devices in the circuit. Our compiler uses architecture specific parallelization strategies (OpenMP for multi-core, PThreads for Cell, CUDA for GPU, statically scheduled VLIW for FPGA) when producing code for these different architectures. We automatically explore different implementation configurations (e.g. unroll factor, vector length) using our performance-tuner to identify the best possible configuration for each architecture. We demonstrate speedups of 3- 182times for a Xilinx Virtex5 LX 330T, 1.3-33times for an IBM Cell, and 3-131times for an NVIDIA 9600 GT GPU over a 3 GHz Intel Xeon 5160 implementation for a variety of single-precision device models

    Optimistic Parallelization of Floating-Point Accumulation

    Get PDF
    Floating-point arithmetic is notoriously non-associative due to the limited precision representation which demands intermediate values be rounded to fit in the available precision. The resulting cyclic dependency in floating-point accumulation inhibits parallelization of the computation, including efficient use of pipelining. In practice, however, we observe that floating-point operations are "mostly" associative. This observation can be exploited to parallelize floating-point accumulation using a form of optimistic concurrency. In this scheme, we first compute an optimistic associative approximation to the sum and then relax the computation by iteratively propagating errors until the correct sum is obtained. We map this computation to a network of 16 statically-scheduled, pipelined, double-precision floating-point adders on the Virtex-4 LX160 (-12) device where each floating-point adder runs at 296 MHz and has a pipeline depth of 10. On this 16 PE design, we demonstrate an average speedup of 6× with randomly generated data and 3-7× with summations extracted from Conjugate Gradient benchmarks

    Seven strategies for tolerating highly defective fabrication

    Get PDF
    In this article we present an architecture that supports fine-grained sparing and resource matching. The base logic structure is a set of interconnected PLAs. The PLAs and their interconnections consist of large arrays of interchangeable nanowires, which serve as programmable product and sum terms and as programmable interconnect links. Each nanowire can have several defective programmable junctions. We can test nanowires for functionality and use only the subset that provides appropriate conductivity and electrical characteristics. We then perform a matching between nanowire junction programmability and application logic needs to use almost all the nanowires even though most of them have defective junctions. We employ seven high-level strategies to achieve this level of defect tolerance

    The Case for Reconfigurable Components with Logic Scrubbing: Regular Hygiene Keeps Logic FIT (low)

    Get PDF
    As we approach atomic-scale logic, we must accommodate an increased rate of manufacturing defects, transient upsets, and in-field persistent failures. High defect rates demand reconfiguration to avoid defective components, and transient upsets demand online error detection to catch failures. Combining these techniques we can detect in-field persistent failures when they occur and reconfigure around them. However, since failures may be logically masked for long periods of time, persistent failures may accumulate silently; this integration of errors over time means the effective failure rate for persistent errors can exceed transient upset rates. As a result, logic scrubbing is necessary to prevent the silent accumulation of an undetectable number of persistent errors. We provide simple analysis to illustrate quantitatively how this phenomena can be a concern

    Stochastic assembly of sublithographic nanoscale interfaces

    Get PDF
    We describe a technique for addressing individual nanoscale wires with microscale control wires without using lithographic-scale processing to define nanoscale dimensions. Such a scheme is necessary to exploit sublithographic nanoscale storage and computational devices. Our technique uses modulation doping to address individual nanowires and self-assembly to organize them into nanoscale-pitch decoder arrays. We show that if coded nanowires are chosen at random from a sufficiently large population, we can ensure that a large fraction of the selected nanowires have unique addresses. For example, we show that N lines can be uniquely addressed over 99% of the time using no more than /spl lceil/2.2log/sub 2/(N)/spl rceil/+11 address wires. We further show a hybrid decoder scheme that only needs to address N=O(W/sub litho-pitch//W/sub nano-pitch/) wires at a time through this stochastic scheme; as a result, the number of unique codes required for the nanowires does not grow with decoder size. We give an O(N/sup 2/) procedure to discover the addresses which are present. We also demonstrate schemes that tolerate the misalignment of nanowires which can occur during the self-assembly process
    corecore